Prevent .tar file corruption by patching short reads #261

joost-j wants to merge 4 commits into fox-it:main
Conversation
Codecov Report ❌ Patch coverage is

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main     #261      +/-   ##
==========================================
+ Coverage   44.93%   45.20%    +0.26%
==========================================
  Files          26       26
  Lines        3527     3568       +41
==========================================
+ Hits         1585     1613       +28
- Misses       1942     1955       +13
```
Force-pushed 57a9c5b to 5b22f2c
```python
if info.size is None:
    shutil.copyfileobj(fh, self.tar.fileobj, bufsize)
    return
```
I think we can remove this block since it would be an illegal action in this context.
```python
for _ in range(blocks):
    # Prevents "long reads" because it reads at max bufsize bytes at a time
    buf = fh.read(bufsize)
    if len(buf) < bufsize:
```
I think you can generalize this case instead of doing it twice. Keep track of how many bytes you actually wrote (e.g. using `.tell()`) and only pad once.
```python
blocks, remainder = divmod(info.size, bufsize)
for _ in range(blocks):
    # Prevents "long reads" because it reads at max bufsize bytes at a time
    ...
    self.tar.fileobj.write(buf)

if remainder != 0:
    # Prevents "long reads" because it reads at max bufsize bytes at a time
    ...

info = copy.copy(info)
```
```python
buf = info.tobuf(self.tar.format, self.tar.encoding, self.tar.errors)
```
You could make this even safer by truncating to the previous offset/tar member end if any exception occurs while writing.
Co-authored-by: Erik Schamper <1254028+Schamper@users.noreply.github.com>
Any idea when this is getting fixed? It's affecting me as well. Let me know if there's anything I can do to help!
Fixes an issue where a `.tar` output file would contain inconsistencies between the expected and actual file sizes of the included files.

In some cases, a file on disk can report a size of X bytes, but at the time of actually reading X bytes from the file, fewer than X bytes are available (a short read). Acquire would report these issues as an `OSError` in the resulting Acquisition log file, because that is how the Python stdlib `tarfile.py` handles them. However, data may already have been written to the destination archive at that point. Afterwards, Acquire continues to add new files to the archive. When trying to untar the file using `tar -xvf <FILE>`, this shows up as a `tar: Skipping to next header` error and, finally, the process exits with a nonzero exit code.

Included a test case which simulates a file that returns fewer bytes than its reported size, to cover this case.
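For reference, the short-read failure mode described above can be reproduced against the stdlib in a few lines; the names here (`claimed_size`, `victim.txt`) are made up for the demo and are not from the PR's test case:

```python
import io
import tarfile

# A source that can only deliver 100 bytes, while the tar header will claim 512
data = io.BytesIO(b"A" * 100)
claimed_size = 512

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    info = tarfile.TarInfo("victim.txt")
    info.size = claimed_size
    try:
        tar.addfile(info, data)
    except OSError as exc:
        # tarfile raises once the source runs out before `info.size` bytes,
        # but the member header (and part of the data) is already in the archive
        print(exc)
```

Because the header and partial data have already been written when the `OSError` surfaces, any members appended afterwards land at a misaligned offset, which is what `tar` later reports as `Skipping to next header`.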